WHAT IS CLAIMED IS: 



1. A method for finding an optimal set of data association rules in automated data 
diagnosis of the data characterizing an entity, comprising the steps of: 

establishing a computer system for automated data diagnoses; 

creating in said computer system a relation R containing the data A = 
(A_1, ..., A_n, A_{n+1}, ..., A_{n+m}), where n, m > 1, said data characterizing the entity to be diagnosed, 
said data being represented by outcome attributes A_{n+1}, ..., A_{n+m} and by diagnosable attributes 
A_1, ..., A_n, said outcome attributes determining whether the entity is desirable or not, and said 
diagnosable attributes determining the reason of why the entity is desirable or not; 

specifying a selection outcome condition (D) determined by a user which 
includes strictly outcome attributes selected from said A_{n+1}, ..., A_{n+m} attributes; 

specifying at least one diagnosable selection condition (C) which includes 
diagnosable attributes selected by the user from said A_1, ..., A_n attributes; 

specifying selection condition constraints (S) for said diagnosable selection 
conditions to meet, said selection condition constraints (S) including minimal acceptable 
confidence conf(c), minimal acceptance support sup(c) and maximum order of said 
diagnosable selection condition; 



specifying a number of fringes of interest, F^0, F^1, ..., F^i, F^{i+1}, 



wherein 



F^0 = {C | Up(C) = ∅} 
F^{i+1} = {C | Up(C) ... ⋃ ... 

and wherein the fringe F^0 represents an optimal set of selection conditions 
with regard to a combination of respective ones of said specified support, said specified 
confidence and said specified "simpler-than" ordering, the fringe F^1 represents the set of 
diagnosable selection conditions that are less desirable than the fringe F^0, and the fringe F^{i+1} 
represents the set of diagnosable selection conditions that are less desirable than fringe F^i; 

computing said optimal fringes F^0, F^1, ..., F^i, F^{i+1}, and 

computing a compact set of said optimal fringes to eliminate redundant 
conditions, said compact set representing the optimal set of data association rules. 

2. The method of Claim 1, further comprising the steps of: 

specifying a "simpler-than" ordering (>_simpler), including a set of said 
diagnosable selection conditions (C) which are simpler than a predetermined diagnosable 
selection condition; 

specifying a Data Diagnosis Objective (DDO) by defining components 
thereof including: 

(a) evaluation domain (ED) providing for measurement of quality 
of said diagnosable selection conditions, 

(b) a partial ordering (⊑) of said evaluation domain, specifying 
which diagnosable selection conditions in said evaluation domain are better than others, and 

(c) a mapping function (f) that maps said diagnosable selection 
conditions to said evaluation domain, such that C_1 > C_2 ⇒ f(C_2) ⊑ f(C_1), 

wherein C_1 and C_2 are diagnosable selection conditions; and 

specifying a semi-equivalence relation (A) on said diagnosable selection 
conditions to determine similarity thereof. 

3. The method of Claim 1, wherein said entity is a product of a manufacturing 
process. 

4. The method of Claim 1, wherein said entity is a financial prediction process. 

5. The method of Claim 2, further comprising the step of combining said 
diagnosable conditions C to form a diagnosable selection condition SC. 

6. The method of Claim 5, further comprising the step of restricting said 
diagnosable selection conditions to tight diagnosable selection conditions T, 

wherein said diagnosable selection condition SC is tight if for each 
diagnosable condition l ≤ A_i ≤ u in said diagnosable selection condition SC, 

(σ_{A_i = l}(σ_{SC ∧ D}(R)) ≠ ∅) ∧ (σ_{A_i = u}(σ_{SC ∧ D}(R)) ≠ ∅), where σ is a relational selection operator, 
and 

wherein A_i is a diagnosable attribute, and l, u ∈ dom(A_i) are values defined 
by the user. 
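
By way of illustration only, the tightness test of Claim 6 might be sketched in Python as follows. The sketch assumes that a selection condition is tight when, for every bound l ≤ A_i ≤ u, at least one row satisfying SC ∧ D takes the value l and at least one takes the value u. The representation of SC as a dictionary of bounds, the names `satisfies` and `is_tight`, and the sample rows are all hypothetical and form no part of the claimed method.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]  # hypothetical: attribute -> (l, u)

def satisfies(row: Row, sc: Bounds) -> bool:
    """True if the row lies within every [l, u] interval of the condition."""
    return all(l <= row[attr] <= u for attr, (l, u) in sc.items())

def is_tight(sc: Bounds, d: Callable[[Row], bool], relation: List[Row]) -> bool:
    """Assumed tightness test: both endpoints of every interval are attained
    by some row selected by SC and D together."""
    selected = [row for row in relation if satisfies(row, sc) and d(row)]
    for attr, (l, u) in sc.items():
        has_lower = any(row[attr] == l for row in selected)
        has_upper = any(row[attr] == u for row in selected)
        if not (has_lower and has_upper):
            return False
    return True

# Hypothetical relation R and outcome condition D
R = [
    {"temp": 200, "defect": 1},
    {"temp": 210, "defect": 1},
    {"temp": 205, "defect": 0},
]
D = lambda row: row["defect"] == 1

tight = is_tight({"temp": (200, 210)}, D, R)      # both endpoints attained
not_tight = is_tight({"temp": (195, 210)}, D, R)  # no selected row has temp == 195
```

Restricting enumeration to tight conditions in this sense avoids considering intervals whose bounds could be narrowed without changing the selected rows.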

7. The method of Claim 6, wherein said step of computing said optimal fringes 
further comprises the steps of: 

creating a condition graph by enumerating a set of tight selection conditions 
T satisfying said selection condition constraints S, and 

evaluating support and confidence defined by the user. 

8. The method of Claim 6, wherein if l = u, then A_i = l. 

9. The method of Claim 6, wherein l = u if said A_i is a non-numeric diagnosable 
attribute. 

10. The method of Claim 1, wherein said outcome attributes and said diagnosable 
attributes in said relation R include numeric or non-numeric attributes. 

11. The method of Claim 6, wherein said l ≤ A_i ≤ u is a diagnosable selection 
condition. 

12. The method of Claim 6, wherein if C_1 and C_2 are diagnosable conditions, then 
C_1 ∧ C_2 is a diagnosable selection condition. 



13. The method of Claim 6, wherein the confidence conf(c) of said diagnosable 
condition C is defined as 

conf(C) = card(σ_{C∧D}(R)) / card(σ_C(R)). 



14. The method of Claim 6, wherein said support sup(c) of said diagnosable 
condition C is defined as 

sup(C) = card(σ_{C∧D}(R)). 
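
By way of illustration only, the support and confidence of Claims 13 and 14 might be computed as in the following Python sketch. It assumes that the relation R is held as a list of dictionaries and that the conditions C and D are predicates over a row. The confidence is taken as the conventional ratio card(σ_{C∧D}(R)) / card(σ_C(R)), which is an assumption. The names `select`, `support` and `confidence` and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[Row], bool]

def select(cond: Condition, relation: List[Row]) -> List[Row]:
    """Relational selection: the rows of the relation that satisfy cond."""
    return [row for row in relation if cond(row)]

def support(c: Condition, d: Condition, relation: List[Row]) -> int:
    """sup(C): number of rows satisfying both C and D."""
    return len(select(lambda row: c(row) and d(row), relation))

def confidence(c: Condition, d: Condition, relation: List[Row]) -> float:
    """conf(C): fraction of rows satisfying C that also satisfy D.
    Returns 0.0 when C selects no rows (an assumed convention)."""
    matched = len(select(c, relation))
    return support(c, d, relation) / matched if matched else 0.0

# Hypothetical relation R with one diagnosable and one outcome attribute
R = [
    {"temp": 200, "defect": True},
    {"temp": 210, "defect": True},
    {"temp": 205, "defect": False},
    {"temp": 150, "defect": False},
]
C = lambda row: 200 <= row["temp"] <= 210  # diagnosable selection condition
D = lambda row: row["defect"]              # outcome selection condition

sup_c = support(C, D, R)      # two rows satisfy both C and D
conf_c = confidence(C, D, R)  # two of the three rows satisfying C also satisfy D
```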

15. The method of Claim 2, wherein said "simpler-than" ordering is defined as: 

(a) C_1 >_simpler C_2 if each diagnosable attribute occurring in C_1 also occurs 
in C_2; or 

(b) C_1 >_simpler2 C_2 if C_1 has the same or fewer distinct attributes occurring 
in said C_1 than in said C_2; or 

(c) C_1 >_simpler3 C_2 if said C_1 has fewer distinct attributes than said C_2, or 
said C_1 has the same number of distinct attributes as C_2 but fewer numeric diagnosable 
attributes. 
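
By way of illustration only, the three alternative "simpler-than" orderings of Claim 15 might be expressed as in the following Python sketch. It assumes that a diagnosable selection condition is represented as a dictionary mapping each attribute to its (lower, upper) bounds and that the set of numeric attributes is known. The function names and sample conditions are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

Cond = Dict[str, Tuple[Any, Any]]  # hypothetical: attribute -> (lower, upper)

def simpler_subset(c1: Cond, c2: Cond) -> bool:
    """(a) every attribute occurring in c1 also occurs in c2."""
    return set(c1) <= set(c2)

def simpler_count(c1: Cond, c2: Cond) -> bool:
    """(b) c1 has the same number of distinct attributes as c2, or fewer."""
    return len(c1) <= len(c2)

def simpler_numeric(c1: Cond, c2: Cond, numeric: Set[str]) -> bool:
    """(c) c1 has fewer attributes, or equally many but fewer numeric ones."""
    n1 = sum(1 for attr in c1 if attr in numeric)
    n2 = sum(1 for attr in c2 if attr in numeric)
    return len(c1) < len(c2) or (len(c1) == len(c2) and n1 < n2)

# Hypothetical conditions: "temp" is numeric, "line" is non-numeric
numeric_attrs = {"temp"}
c_temp = {"temp": (200, 210)}
c_line = {"line": ("B", "B")}
c_both = {"temp": (200, 210), "line": ("B", "B")}
```

Under ordering (a), `c_temp` is simpler than `c_both` but not the reverse; under ordering (c), `c_line` is simpler than `c_temp` because both have one attribute and `c_line` has no numeric attribute.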



16. The method of Claim 2, further comprising the steps of: 

applying standard metrics to said DDO for comparing diagnosable selection 
conditions including either from a group thereof, consisting of: chi-squared value, 
confidence, conviction, entropy gain, Laplace, lift, gain, Gini, and support. 

17. The method of Claim 2, wherein said fringes are defined independent of said 
DDO. 

18. The method of Claim 17, further comprising the step of applying a plurality of 
distinct said DDO to said set of fringes avoiding recomputing said set of fringes. 

19. The method of Claim 2, wherein said semi-equivalence relation A is a 
distance-based semi-equivalence relation. 

20. The method of Claim 2, wherein said semi-equivalence relation A is an 
attribute distance threshold based semi-equivalence relation, the method further comprising 
the steps of: 

defining L_{C,A} and U_{C,A} to be the lower and upper bounds, respectively, of a 
respective diagnosable attribute A in said diagnosable condition C, and defining a 
diagnosable selective condition C_1 as semi-equivalent to a diagnosable selective condition 
C_2 if: 

a. the set of diagnosable attributes appearing in said C_1 is equivalent 
to the set of diagnosable attributes appearing in said C_2, 

b. for each numeric diagnosable attribute A appearing in said C_1 
and said C_2, 

d(L_{C_1,A}, L_{C_2,A}) < ε_A 
d(U_{C_1,A}, U_{C_2,A}) < ε_A, 

where the ε_A values are constants that differ based on the 
diagnosable attribute A; and 

c. for each non-numeric attribute A appearing in C_1 and C_2, L_{C_1,A} = 
L_{C_2,A}. 
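
By way of illustration only, the attribute distance threshold test of Claim 20 might be sketched in Python as follows. The sketch assumes that the distance d is the absolute difference between bounds, that the inequality is strict, and that non-numeric attributes must carry equal lower bounds in both conditions. The dictionary representation, the name `semi_equivalent`, the thresholds and the sample conditions are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

Cond = Dict[str, Tuple[Any, Any]]  # hypothetical: attribute -> (lower, upper)

def semi_equivalent(c1: Cond, c2: Cond, numeric: Set[str],
                    eps: Dict[str, float]) -> bool:
    """Assumed semi-equivalence test with per-attribute thresholds eps[A]."""
    # (a) both conditions must mention exactly the same attributes
    if set(c1) != set(c2):
        return False
    for attr in c1:
        l1, u1 = c1[attr]
        l2, u2 = c2[attr]
        if attr in numeric:
            # (b) lower and upper bounds must each differ by less than eps[A];
            # absolute difference stands in for the distance d (assumption)
            if not (abs(l1 - l2) < eps[attr] and abs(u1 - u2) < eps[attr]):
                return False
        elif l1 != l2:
            # (c) non-numeric attributes must take the same value (assumption)
            return False
    return True

# Hypothetical conditions and thresholds
numeric_attrs = {"temp"}
thresholds = {"temp": 2.0}
c1 = {"temp": (200, 210), "line": ("B", "B")}
c2 = {"temp": (201, 209), "line": ("B", "B")}  # bounds within 2.0 of c1
c3 = {"temp": (205, 210), "line": ("B", "B")}  # lower bound 5 away from c1
c4 = {"temp": (200, 210), "line": ("A", "A")}  # different non-numeric value
```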

21. The method of Claim 2, further comprising the steps of: 

defining a subset CF of said set F of the optimal fringes as said compact 
representation of said set F with regard to said A, if: 

a. for each diagnosable selective condition sc ∈ F, ∃ sc' ∈ CF such that 
sc A sc'; 

b. if sc ∈ F and sc ∉ CF, then ∃ sc' ∈ CF such that 
sc A sc' ∧ (¬(f(sc') ⊑ f(sc)) ∨ (f(sc') = f(sc))), and 

c. there is no strict subset CF' of CF satisfying said conditions (a) 
and (b). 

22. A system for automated data diagnosis, comprising: 
a computer system; 

means in said computer system for storing data characterizing an 
entity to be diagnosed, 

means for forming a relation R containing said data, said relation R 
containing the data A = (A_1, ..., A_n, A_{n+1}, ..., A_{n+m}), where n, m > 1, the data being represented 
by outcome attributes A_{n+1}, ..., A_{n+m} and diagnosable attributes A_1, ..., A_n, said outcome 
attributes determining whether the entity is desirable or not, and said diagnosable attributes 
determining the reason of why the entity is desirable or not; 

an interface for communication between a user and said computer system, 
the user inputting into said computer system a plurality of selective conditions through said 
interface, and 

means in said computer system for computing optimal data association rules 
for said data to be diagnosed, based on said selective conditions, 

said selective conditions being characterized in combination, by a 
confidence, support and simplicity of said selective conditions. 